Self-Distillation Multi-task Learning Integrating Multi-dimensional Perception for Multimodal Sequential Recommendation
TANG Zhe1, PANG Jifang1,2, XIE Yu1,2, WANG Zhiqiang1,2
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Abstract: Multimodal sequential recommendation is an important application scenario of recommender systems and a research focus in both industry and academia. However, existing multi-task learning approaches for multimodal sequential recommendation do not fully consider the high-order relationships within modalities or the enhancing effect of users' short-term sequences. As a result, their semantic representations and interest modeling are weak, and the degree of personalization is low. To address this issue, a self-distillation multi-task learning approach integrating multi-dimensional perception for multimodal sequential recommendation (SD-MTMP) is proposed. First, topics are extracted from user reviews, and high-order semantic correlations within user groups and item collections are modeled by constructing user-topic and item-topic hypergraphs, respectively. Topic-aware node representations are generated through hypergraph convolution. Simultaneously, a weighted bipartite graph is built from the user-item rating matrix to generate rating-aware node representations. Second, a cross-modal self-distillation auxiliary task is designed to achieve semantic alignment by transferring knowledge from the topic-aware representations to the rating-aware representations. Additionally, a dual-aware attention mechanism that accounts for the effects of both user ratings and time intervals on short-term sequences is established to accurately model users' short-term interests. On this basis, a multi-task learning strategy is proposed for multimodal sequential recommendation. It jointly optimizes the recommendation loss and the self-distillation loss, further enhancing the semantic expressiveness of the representations and improving recommendation performance. Finally, experiments on three public datasets demonstrate the effectiveness of SD-MTMP.
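The abstract names three components without giving their functional forms: a dual-aware attention over the short-term sequence, a self-distillation loss from topic-aware to rating-aware representations, and a joint objective. The sketch below illustrates one plausible reading of each, under stated assumptions. The function names, the linear scoring rule with coefficients alpha and beta, the mean-squared-error form of the distillation loss, and the weighting factor lam are all hypothetical choices made for illustration and are not taken from SD-MTMP itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_aware_weights(ratings, time_gaps, alpha=1.0, beta=0.1):
    """Attention weights over a short-term sequence.

    ASSUMPTION: a higher rating raises an item's score and a larger time
    gap to the prediction moment lowers it, combined linearly. The actual
    mechanism in the paper may differ.
    """
    scores = [alpha * r - beta * g for r, g in zip(ratings, time_gaps)]
    return softmax(scores)


def distillation_loss(teacher, student):
    """Mean squared error between a topic-aware (teacher) vector and a
    rating-aware (student) vector.

    ASSUMPTION: MSE is used here only as a simple alignment measure.
    """
    return sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)


def joint_loss(rec_loss, sd_loss, lam=0.5):
    """Joint objective: recommendation loss plus a weighted
    self-distillation loss. The weight lam is a hypothetical parameter."""
    return rec_loss + lam * sd_loss


# Usage with toy values: three recent interactions of one user.
weights = dual_aware_weights(ratings=[5.0, 3.0, 4.0], time_gaps=[1.0, 1.0, 10.0])
sd = distillation_loss(teacher=[0.2, 0.8], student=[0.1, 0.6])
total = joint_loss(rec_loss=1.0, sd_loss=sd)
```

In this reading, the teacher vector would be treated as fixed when the distillation term is differentiated, so that knowledge flows only from the topic-aware side to the rating-aware side, matching the transfer direction described in the abstract.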